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with Variance Fnnctions 
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Abstract 

The nested error regression model is a useful tool for analyzing clustered (grouped) data, and is 
especially used in small area estimation. The classical nested error regression model assumes nor¬ 
mality of random effects and error terms, and homoscedastic variances. However, these assumptions 
are often violated in real applications and more flexible models are required. This article proposes 
a nested error regression model with heteroscedastic variances, where the normality for the un¬ 
derlying distributions is not assumed. We propose the structure of heteroscedastic variances by 
using some specified variance functions and some covariates with unknown parameters. Under 
the setting, we construct the moment-type estimators of model parameters and some asymptotic 
properties including asymptotic biases and variances are derived. For predicting linear quantities 
including random effects, we suggest the empirical best linear unbiased predictors and the second- 
order unbiased estimators of mean squared errors are derived in the closed form. We investigate 
the proposed method with simulation and empirical studies. 

keywords and phrases: empirical best linear unbiased predictor; heteroscedastic variance; mean 
squared error; nested error regression; small area estimation; variance function 


1 Introduction 

Linear mixed models and the model-based estimators including empirical Bayes (EB) estimator or 
empirical best linear unbiased predictor (EBLUP) have been studied quite extensively in the literature 
from both theoretical and applied points of view. Of these, the small area estimation (SAE) is an 
important application, and methods for SAE have received much attention in recent years due to 
growing demand for reliable small area estimates. Eor a good review on this topic, see Ghosh and Rao 
(1994), Rao and Molina (2015), Datta and Ghosh (2012) and Pfeffermann (2014). The linear mixed 
models used for SAE are the Eay-Herriot model suggested by Eay and Herriot (1979) for area-level 
data and the nested error regression (NER) models given in Battese, Harter and Euller (1988) for 
unit-level data. Especially, the NER model has been used in application of not only SAE but also 
biological experiments and econometric analysis. In the NER model, a cluster-specific variation is 
added to explain the correlation among observations within clusters besides the noise, which allow the 
analysis to ‘borrow strength’ from other clusters. The resulting estimators, such as EB or EBLUP, for 
small-cluster means or subject-specihc values provide reliable estimates with higher precisions than 
direct estimates like sample means. 
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In the NER model with m small-clusters, let (yiij • • •, {Vini, be tii individual observations 
from the i-th cluster for i = 1,... ,m, where Xij is a p-dimensional known vector of covariates. The 
normal NER model proposed by Battese, et al. (1998) is given by 

Vij = x'ijfB + Vi + Eij, i = ,m, j = 

where Vi and denote the random effect and samping error, respectively, and they are mutually 
independently distributed as Vi ~ and Sij ~ N{0,a‘^). The mean of i/ij is x'-f3 for regres¬ 
sion coefficients /3, and the variance of yij is decomposed as Var(?/jj) = which is the same 

for all the clusters. However, Jiang and Nguyen (2012) illustrated that the within-cluster sample 
variances change dramatically from cluster to cluster for the data given in Battese, et al. (1988). 
Then, Jiang and Nguyen (2012) proposed the heteroscedastic nested error regression (HNER) model 
with the setup that variance Var(?/jj) is proportional to af, namely \ai{yij) = (A J- This is 

equivalent to the assumption that Var(ui) = Xa'f and Var(ejj) = af. Under this setup, Jiang and 
Nguyen (2012) assumed normality for Vi and £ij and demonstrated the quite interesting result that 
the maximum likelihood (ML) estimators of f3 and A are consistent for large m, which implies that 
the resulting EB estimator is asymptotically equivalent to the Bayes estimator. Thorough simulation 
studies, Jiang and Nguyen (2012) also showed that that the EBLUP from HNER model can improve 
the prediction accuracy over that from NER model when the data is generated from HNER model. 
However, there is no consistent estimator for the heteroscedastic variance af because of finiteness 
of Ui, and the mean squared error (MSE) of the EBLUP cannot be estimated consistently since it 
depends on af. To fix the inconsistent estimation of af, recently, Kubokawa, Sugasawa, Ghosh and 
Chaudhuri (2016) proposed the hierarchical model such that cr?’s are random variables and a~‘^ has 
a gamma distribution. The same dispersion structure was used in Haiti, Ren and Sinha (2014) who 
applied this hierarchical structure to the Fay-Herriot model (Fay and Herriot, 1979) with statistics for 
estimating af. Kubokawa, et al. (2016) proposed the ML estimators of model parameters including 
the shape and scale parameters in dispersion distribution of af. They also showed the consistency 
of the model parameters and constructed the second-order unbiased mean squared errors of MSE by 
using parametric bootstrap. 

While these two HNER models are useful for analyzing unit-level data with heteroscedastic vari¬ 
ances, the serious drawback of the two models is that both require normality assumption for random 
effects and error terms, which are not necessary satisfied in real application. Hence, the purpose of 
this paper is to address the issue of relaxing assumptions of classical normal NER models toward two 
directions: heteroscedasticity of variances and non-normality of underlying distributions. 

In real data analysis, we often encounter the situation where the sampling variance Var(ejj) is 
affected by the covariate Xij. In such case, the variance function is a useful tool for describing its 
relationship. Variance function estimation has been studied in the literature in the framework of 
heteroscedastic nonparametric regression. For example, see Cook and Weisberg (1983), Hall and 
Carroll (1989), Muller and Stadtmuller (1987, 1993) and Ruppert, Wand, Holst and Hossjer (1997). 
Thus, in this paper, we propose the use of the technique to introduce the heteroscedastic variances into 
the NER model without assuming normality of underlying distributions. The variance structure we 
consider is Vex (yij) = + namely, the setup means that the sampling error Sij has heteroscedastic 

variance Var(ejj) = afj. Then we suggest the variance function model given by afj = 
where the details are explained in Section [2j In terms of modeling the heteroscedastic variances with 
covariates, the generalized linear mixed models (Jiang, 2006) are also the useful tool. The small area 
models using generalized linear mixed models are investigated in Ghosh, Natarajan, Stroud and Carlin 
(1998). However, the generalized linear mixed model requires strong parametric assumption compared 
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to the heteroscedastic model without assuming underlying distributions proposed in this paper. Hence, 
the generalized linear mixed model seems still restrictive while it is an attractive method for modelling 
heteroscedasticity in variances. 

In this paper, we propose flexible and tractable HNER models without assuming normality for 
either Vi nor £ij. The advantage of the proposed model is that the MSE of the EB or EBLUP and 
its unbiased estimator are derived analytically in closed forms up to second-order without assuming 
normality for Vi and Eij. Nonparametric approach to SAE has been studied by Jiang, Lahiri and Wan 
(2002), Hall and Haiti (2006), Lohr and Rao (2009) and others. Most estimators of the MSE have 
been given by numerical methods such as Jackknife and bootstrap methods except for Lahiri and Rao 
(1995), who provided an analytical second-order unbiased estimator of the MSE in the Fay-Heriot 
model. Hall and Haiti (2006) developed a moment matching bootstrap method for nonparametric 
estimation of MSE in nested error regression models. The suggested method is actually convenient 
but it requires bootstrap replication and has computational burden. In this paper, without assuming 
the normality, we derive a closed expression for a second-order unbiased estimator of the MSE using 
second-order biases and variances of estimators of the model parameters. Thus our MSE estimator 
does not require any resampling method and is convenient in practical use. Also our MSE estimator 
can be regarded as a generalization of the robust MSE estimator given in Lahiri and Rao (1995). 

The paper is organized as follows: A setup of the proposed HNER model and estimation strategy 
with asymptotic properties are given in Section [2j In Section [3l we obtain the EBLUP and the second- 
order approximation of the MSE. Further, we provide the second-order unbiased estimators of MSE 
by the analytical calculation. In Section 01 we investigate the performance of the proposed procedures 
through simulation and empirical studies. All the technical proofs are given in the Appendix. 

2 HNER Models with Variance Functions 

2.1 Model settings 

Suppose that there are m small clusters, and let (j/ii, ®ii)) • • • j {Vini-,Xirn) be the pairs of rii observations 
from the Lth cluster, where Xij is a p-dimensional known vector of covariates. We consider the 
heteroscedastic nested error regression model 

Vij = AjP + Vi + £ij, j = l,...,ni, i = ( 1 ) 

where /3 is a p-dimensional unknown vector of regression coefficients, and Vi and £ij are mutually 
independent random variables with mean zero and variances Var(uj) = and Var(ejj) = afj, which 
are denoted by 

Vi ~ ( 0 ,r^) and Sij ~ ( 0 ,cr|,). ( 2 ) 

It is noted that no specific distributions are assumed for Vi and Sij. It is assumed that the heteroscedas¬ 
tic variance afj of Eij is given by 

i = I,..., m, (3) 

where Zij is a g-dimensional known vector given for each cluster, and 7 is a ^-dimensional unknown 
vector. The variance function <t^(-) is a known (user specified) function whose range is nonnegative. 
Some examples of the variance function are given below. The model parameters are /3, and 7 , and 
the total number of the model parameters is p J- 5 J- 1 . 
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Let = {yn,... ,yinj, Xi = (a;*!,...,and e, = (ea, ■ ■ ■ ,emj'- Then the model ([I]) is 
expressed in a vector form as 


yi = XiP + Vilni+ ei, i = 


where is an n x 1 vector with all elements equal to one, and the covariance matrix of Cj is 

Sj = Var(yJ = r^Jni + Wi, 

for = Iriil'm Wi = diag((T?j^,... ,cri^.). It is noted that the inverse of Sj is expressed as 





r^Jn.W-^ \ 

1 + r^E^i-^V’ 


where VL/ = diag(cjE,---,f^in!)- 
(ei>--->em)'and'n = ..., 

where Var(y) = S = block diag(Si,. 
a 2 (^i. 7 ) in ©. 


Further, let y = {y[,... ,y'^y, X = {X'^^,..., X'^)\ e = 
. Then, the matricial form of ([T|) is written asy = XP+v+e, 
.., 'Sm)- Now we give three examples of the variance function 


(a) In the case that the dispersion of the sampling error is proportional to the mean, it is reasonable 

to put Zij = and for a sub-vector of the covariate Xij. For 

identifiability of 7 , we restrict 71 > 0 . 

(b) Consider the case that m clusters are decomposed into q homogeneous groups Si,... ^Sq with 

m} = S'! U ... U Sq. Then, we put 

-^7 = (l{ieSi}5 • • •) IfieSg}) 5 


which implies that 

= It for St. 

Note that Var(yjj) = for i G St. Thus, the models assumes that the m clusters are divided 

into known q groups with their variance are equal over the same groups. Jiang and Nguyen (2012) 
used a similar setting and argued that the unbiased estimator of the heteroscedastic variance is 
consistent when |5fc|—)-oo,A: = l,...,gasm—)-oo, where \Sk\ denotes the number of elements 
in 5fc. 


(c) Log linear functions of variance were treated in Cook and Weisberg (1983) and others. That is, 
logcr?^' is a linear function, and afj is written as = exp(zC 7 ). Similarly to (a), we put 

Zij ^[s)ij‘ 

For the above two cases (a) and (b), we have (j‘^{x) = x^, while the case (c) corresponds to 
log{(T^(x)} = X. In simulation and empirical studies in SectionHJ we use the log-linear variance model. 
As given in the subsequent section, we show consistency and asymptotic expression of estimators for 
7 as well as P and r^. 
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2.2 Estimation 


We here provide estimators of the model parameters /3, and 7 . When values of 7 and are given, 
the vector (3 of regression coefficients is estimated by the generalized least squares (GLS) estimator 

( m \ m 

(4) 

i=l / i=l 


This is not a feasible form since 7 and are unknown. When estimators and 7 are used for 
and 7 , we get the feasible estimator (3 = 13(9'^,^) by replacing and 7 in /3 with their estimators. 

Concerning estimation of r^, we use the second moment of observations yij^s. From model ([T]), it 
is seen that 

E [{Vij - x[j/3f] =T^ + (5) 

Based on the ordinary least squares (OLS) estimator /Sqls ~ ^ moment estimator of 

is given by 

mm 

^ X] - x'ijPoi^sf - (Ajl )} > (6) 

i=l j=l 

with substituting estimator 7 into 7 , where N = 

For estimation of 7 , we consider the within difference in each cluster. Let y* be the sample mean 
in the i-th cluster, namely y* = Vij- noted that for Ei = 

Vij Hi — {Xij ajj) [3 T {Sij £i)) 


which dose not include the term of Vi. Then it is seen that 


E 


{Vij -Vi- {x^j - x^yp}^] = (1 - 2n/) ^ V (y^{z\ypi), 


h=l 


which motivates us to estimate 7 by solving the following estimating equation given by 

m rii f m 

N 


^ m m r 2 m ' 

vEE { Vij -Vi- {xij - a;i)'3oLs} - (1 - 2ni (y^{z\p) - 


i=l j=l . 
which is equivalent to 


h=l 


^ij — 0, 


m rii 

vEE 

1=1 j=i 

where Zi = ny^YA=i^V‘ noted that, in the homoscedastic case with = 6^, the 

estimators of (5^ and reduce to the estimators identical to the Prasad-Rao estimator (Prasad and 
Rao, 1990) up to the constant factor. 

Note that the function given in the left side of d?]) does not depend on /3 and and the estimator 
of does not depend on /3 but on 7 . These suggest the simple algorithm for calculating the estimates 
of the model parameters: We first obtain the estimate 7 of 7 by solving ([7]), and then we get the 
estimate from ([ 6 ]) with 7 = 7 . Finally we have the GLS estimate P with substituting 7 and in 

(SD- 


{vij -Vi - {xij - ®i)'3oLs} ^ij - - 2n. ^Zij + ^Zi) 


= 0 (7) 
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2.3 Large sample properties 

In this section, we provide large sample properties of the estimators given in the previous subsection 
when the number of clusters m goes to infinity, but n^’s are still bounded. To establish asymptotic 
results, we assume the following conditions under m —)• oo. 


Assumption (A) 


(Al) There exist bounded values n and n such that n < rii <n for i = 1,... ,m. The dimensions p 
and q are bounded, namely p,q = 0(1). The number of clusters with one observation, namely 
rij = 1, is bounded. 

(A2) The variance function is twice differentiable and its derivatives are denoted by 
and respectively. 

(A3) The following matrices converge to non-singular matrices: 


m rii 

i=l j=l 


lit I Li 

2=1 j = l 


for oi = 1, 2 and 02 = —1,0,1. 

(A4) < 00 and < 00 for 0 < c < 1. 

(A5) For all i and j, there exist 0 < ci, ci < 00 and bounded values cg, C 2 such that ci < < ci 

and £2 < < C 2 with A: = 1, 2 on the neighborhood of the true values. 


The conditions (Al) and (A3) are the standard assumptions in small area estimation. The condition 
(A2) is also non-restrictive, and the typical variance functions cr^(x) = and (T^(x) = expx obviously 
satisfy the assumption. The moment condition (A4) is used for deriving second-order approximation 
of MSE of the EBLUP discussed in Section [3l and it is satisfied by many continuous distributions, 
including normal, shifted gamma, Laplace and t-distribution with degrees of freedom larger than 9. 
The three examples given in Section [2.11 satisfy the condition (A5). 

In what follows, we use the notations 






ij[k) 


for simplicity. To derive asymptotic approximations of the estimators, we use the following notations 
in the i-th. cluster: 


Uli = 


m 

N 


rii 


{{Vij - , 


U2i = 


m 

N 


n-i 


Y [{y^j - Vi - i^ij - + ^2 


j=l 


with 


( 8 ) 

(9) 


m rtk 


m rik 


'^^h) = YY ^kh{l)^kh, ^2(7) — EE ^kh(l)i^kh 2 n^ Zkh np. Zk)zp.f^ 

k=lh=l 


( 10 ) 
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Note that 2 ^ 1 ( 7 ) = 0{m) and 2 ^ 2 ( 7 ) = 0{m under Assumption (A). Then we obtain the asymp¬ 
totically linear expression of the estimators. 


Theorem 1. Let 9 = (/ 3 , 7 ^,r^)' be the estimator of 6 = (/ 3 ', 7 ',r^)h Under Assumption (A), it 
holds that 9 — 6 = with the asymptotically linear expression 

^ m 

i=l 


where 


V’f = m ^ XfL:f\y,-Xi(3), = NT 2 {l)u 2 i, = uu - Tl{-f)'T 2 {l)u 2 ^. 


From Theorem dl it follows that m^^‘^{9 — 9) has an asymptotically normal distribution with mean 
vector 0 and covariance matrix mUt, where ft is a (p-|-g-|-l) x (p + q + 1) matrix partitioned as 


( mUtf3f3 mUlp-y mUtfir 
mQ-yy mfl-yr 
mUl'p^ mUlrT 


lim — 

m—>-00 m 


E 


/ \ 

\ E[f^m ) 


It is noticed that E[uii{yij — x'-fd)] = 0 and E[u 2 i{yij — x[j(3)] = 0 when yij are normally distributed. 
In such a case, it follows Ul/B-y = 0 and = 0, namely fd and cf) = ( 7 ',t^)' are asymptotically 
orthogonal. However, since we do not assume the normality for observations ytj's, fd and cf are not 
necessarily orthogonal. 

The asymptotic covariance matrix mfi or can be easily estimated from samples. For example, 
mUlpfi = limm->.cx) El'ff'ff ] can be estimated by 


llh . 

1=1 


where 'ipf is obtained by replacing unknown parameters 9 in ipf with estimates 9. It is noted that the 
accuracy of estimation is given by 

= Eti3i3 + Op{m~^), 

from Theorem [ 1 ] and Ul = 0{m~^). The estimator Ul will be used to get the estimators of mean 
squared errors of predictors in Section [3l 

We next provide the asymptotic properties of conditional covariance matrix given in the following 
corollary where the proof is given in the Appendix. 

Corollary 1. Under Assumption (A), for i = 1,... ,m, it follows that 


E({9-9)i9-9y yA=^l + ciyMm 




( 11 ) 


where c{yj) is the fourth-order function of y^, so that E\c{yi)\ < 00 under Assumption (A). 
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This property is used for estimation and evaluating the mean squared errors of EBLUP discussed 
in the subsequent section. Moreover, in the evaluation of the mean squared errors of EBLUP and the 
derivation of its estimators, we need to obtain the conditional and unconditional asymptotic biases of 
the estimators 6. 

Let and br\yi) be the second-order conditional asymptotic biases defined as 

- P\yi] =hf{yi) + Op(m"^), E[^ - 7 ]^^] = + Op(m"^), 


In the following theorem, we provide the analytical expressions of bi^\y^) and br\yi). Define 

bi 3 , b-y and br by 


bp = {X'^-^X) ' 



k=l 




and 


&7 = ^ 2 ( 7 ) 


2 ^ col {tr {EkZkrEkXk [Uqls^'a: - (X'X)-^X^l^k]) 


k=l 


m rik 

-EE 2n^ Zkj + rij^ Zfc) , 

k=l j=i 


( 12 ) 


br 


1 

N 


m Uk 

EE ^kj{l)^jkb'y 


k=lj=l 


2 

- J]tr{(X'X)”iX',SfcX4 


1 

m 


m rif^ 

EE ^kj^ll^kj + 

k=l j=l 


- lit 

-^tr(X',XfcUoLs) 


where E^ = In^ '^OLS = (X'X) ^X''EX{X'X) \ Z^r = dia.g{zkir, ■ ■ ■ ^Zhn^r) for r-th 

element Zkjr of z^j, ^( 3 *a for a G {r, 71 ,, 7 ^} and VUj(s) are defined in the proof of Theorem [51 and 
collar-},, denotes a g-dimensional vector (oi,..., aq)'. It is noted that 6 ^, b-y, br are of order 0{m~^). 
Now we provide the second-order approximation to the conditional asymptotic bias. 


Theorem 2. Under Assumption (A), we have 

bfiy,) = (X'E-iX)"' - X^P) + b^, 6«(yJ = T2(7)«2i + ^ 

6W(yr) = m~^uii - m“^Ti(7)'T2(7)it2i + fo, 


where b^^\yP, b^\yP and br\y,i) are of order Op{m ^), and uu and U 2 i are given in dH) and (|^. 
respectively. 

From the above theorem, we immediately obtain the unconditional asymptotic bias of the estima¬ 
tors 6 by taking expectation with respect to y, given in the following Corollary. 




Corollary 2. Under Assumption (A), it holds that 

E[d-e] = {b'f„U^,bry + o{m-^), 
where bf^, b~^ and bj- are given in (| 12 p . 


3 Prediction and Risk Evaluation 


3.1 Empirical predictor 


We now consider the prediction of 

Hi = c[f3 + Vi, 

where c, is a known (user specified) vector and Vi is the random effect in model ([T]). The typical choice 
of Cj is Cj = Xi which corresponds to the prediction of mean of the i-th cluster. A predictor Jl{yi) of Hi 
is evaluated in terms of the MSE E[(pt(yj) — Hi)‘^]- In th® general forms of niVi), the minimizer (best 
predictor) of the MSE cannot be obtain without a distributional assumption for Vi and £ij. Thus we 
focus on the class of linear and unbiased predictors, and the best linear unbiased predictor (BLUE) 
of Hi in terms of the MSE is given by 

Hi = c'/3 + - Xip). 


This can be simplified as 


rii 

Hi = c'/3 + Xij {yij - x[jp) , 
i=i 


where Xij = T'^cr-'^r]- ^ for 77 , = 1 + Ylh=i^ih ■ I^ homogeneous variances, namely 

afj = (5^, it is confirmed that the BLP reduces to Hi = c'/3 + Xi {iji — x[P) with A, = niT'^{6‘^ + 
as given in Hall and Maiti (2006). The BLUE is not feasible since it depends on unknown parameters 
P, 7 and T^. Plugging the estimators into /Ij, we get the empirical best linear unbiased predictor 
(EBLUP) 

Ui 

Hi = c[p + Y % (vij - x'ijp) > ^ij = (14) 

i=i 


for 77 j ^ = 1 + ^ih ■ In the subsequent section, we consider the mean squared errors (MSE) of 

EBLUP (I14|) without any distributional assumptions for Vi and £ij. 


3.2 Second-order approximation to MSE 

To evaluate uncertainty of EBLUP given by m, we evaluate the MSE defined as MSEj((^) = 
E [{Hi — Hi)'^] lor cj) = ( 7 ^^■^)h The MSE is decomposed as 

MSEj(0) = E [{Hi -Hi + Ui- Ui)"^] 

= E [{Hi - Hif] + E [{Hi - Hif] + 2E [{pi - Hi){V'i - h-i)] ■ 

From the expression of pi^ we have 

( rii \ Ui 

'y ^ Ajj 1 1 Vi A ^ [ Xij £ij , 
i=i / i=i 
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which leads to 


Rii{(i)) = E [(/ij - ^if] = X] ^■ 


(15) 


vi=i 


i=i 


For the second term, however, we cannot obtain an exact expression, so that we derive the ap¬ 
proximation up to 0{m~^). Using the Taylor series expansion, we have 


Hi- Hi = 




0 = 0 * 


( 0 - 0 ), 


(16) 


where 0* is on the line between 0 and 0. The straightforward calculation shows that 

rii rii rii 

d(3 ^ “ 


— ^ , o — 'k 

J=1 ^ i=i 




(17) 

i=i 


where 


c 4 -4 2 2 -2 2 

"d ~ z^^ih ^ih{l)^ih ~ 


h=l 


Then each element in Hi/dOdO' is a linear function of y^. Hence under Assumption (A), using the 
similar arguments given in Lahiri and Rao (1995), we can show that 


E [{Hi - Hi)‘^] = R 2 i{(t}) + o{m ^), 
where the detailed proof is given in the Appendix, and 


(18) 


rii \' I 

'' rii 

'^^ij ^0 1 ^77 ( 


/ ' 


rii 

rii 

yy ^ij ^ij^lT + Hi 


II 

i=i 


rii 


i=i 


(19) 

which is of order 0{m~^). All the evaluations of the residual terms appeared in this paper can be 
done by the similar manner, and detailed proofs will be omitted in what follows. 

We next evaluate the cross term E [(/2j — Hi){Hi — Hi)]- This term vanishes under the normality 
assumptions for Vi and £ij, but in general, it cannot be neglected. As in the case of R 2 i, we obtain an 
approximation of E [{Hi — Hi){Hi — Hi)] lo 0{m~^). Noting that 


Hi- Hi = 

and using the expansion (fTBI) . we obtain 

' d Hi 


^ij 1 1 ^ij^ij — 


= Wi 


E [{Hi - Hi)iHi - Hi)] = E 


de 


ki=i 


(0 - 6)wi 


i=i 




(g - ey 


f d^Hi 

\ 0000 ' 


0 = 0 * 


(0 - 6)wi 
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Using the expression of ()17p and Corollary [H the straightforward calculation (whose details are given 
in the Appendix) shows that 


R32i{4>) = 


{e-ey 


/ 

V dOde' 


0 = 0 ‘ 


{6 - 0 )wi 


= o{m ), 


under Assumption (A). Moreover, from Theorem [2l we obtain 


E 




= Rsiiicf), k) + o{m 


for 


rii 


m rxfc 


R3u{(p,i‘^) = rii Z] Z Z ^kh{l)^khZ'kh M2ij{ct), k) 


i=l 


\k=l h=l 


+ m-S"' Z^d 1 - Ti(7)'T2(7)M2i, (<^, k) [, 

i=i 


( 20 ) 


where 

Miijicf), k) = miV“V^r/“^|nir^(3 - Ky) + al^{Ke - 3)| 

AT2ij(0,K) = mN-^T‘^r]~^n~^{ni - 

and Ky, Ke are defined as E{vf) = and E(efj) = kiecrfj, respectively, and k = (K^,Ke)'. The 

derivation of the expression of R 3 ii{cf),K) is also given in the Appendix. From the expression (|20l) . it 
holds that ii 3 ij( 0 , ft) = 0 {m~^). 

Under the normality assumption of Vi and £ij, we immediately obtain Muj = 0 and Af 2 ij = 0 
since k = (3,3)h This leads to R 31 = 0, which means that the cross term does not appear in the 
second-order approximated MSE, that is our result is consistent to the well-known result. 

Now, we summarize the result for the second-order approximation of the MSE. 

Theorem 3. Under Assumption (A), the second-order approximation of the MSE is given by 

MSEi(0) = Riiicf)) + i? 2 i( 0 ) + 2i?3ii(0, k) -b o(m“^), 

where Rii{4>), R 2 i{ 4 >) o,nd R 3 ii{cj>, n) are given in (fTKIl . (fT^ and (l20l) . respectively, and Ru{cf)) = 0(1), 
R2i{4>) = 0 {m~^) and i? 3 ij( 0 , ft) = 0 {m~^). 

The approximated MSE given in Theorem [3] depends on unknown parameters. Thus, in the 
subsequent section, we derive the second-order unbiased estimator of the MSE by the analytical and 
the matching bootstrap methods. 

3.3 Analytical estimator of the MSE 

We first derive the analytical second-order unbiased estimator of the MSE. Erom Theorem [3l R2i{4>) 
is so that it can be estimated by the plug-in estimator i? 2 i(<^) with second-order accuracy, 

namely E[R 2 i{ct>)] = R 2 i{cj)) -U o{m~^). Eor R^ii[<f>,K) with order if a consistent estimator k 

is available for k, this term can be estimated by the plug-in estimator with second-order unbiasedness. 
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To this end, we construct a consistent estimator of k using the expression of fourth moment of 
observations. The straightforward calculation shows that 


rii 


^ ^ {Vij Vi i^ij 




Ui 


= KeU^ - l)(ni - 2){nf -m-l) afj j + ^{2ni - 3) < 

whereby we can estimate by 


rii 


E 4 -E 


vi=i 


i=i 


, 


Kr = 


^ m 

— T 

JSf* 


E{ 

i=i 


Vij Ui [p^ij ® 


i=l 

,-4/„ n\l^2 

in Section 2. For it is observed that 

E 


,)'3}"-3ny(2n,-3)i [ |] 


IT. 




V J = 1 


rii 

-E- 

i=i 




( 21 ) 


where N* = ^{rii — l)(nj — 2)(n^ — n* — 1) X)J=i ^tj and (3 is the feasible GLS estimator of (3 given 


[Vij - ^ij^T = + KeO-y 




which leads to the estimator of Ky given by 

^ m Ui 

K,i) — 


. m Tii 

Lee 


N9^ 


^ 2^2 ^ ^4 

— or an — K^a, 






( 22 ) 


Vij ~ ®3^0Ls) 

i=l j=l 

From Theorem [U it immediately follows that the estimators given in (j21jl and (j22jl are consistent. 
Using these estimators, we can estimate R^u by R^ii{(f),'k) with second-order accuracy. 

Finally, we consider the second-order unbiased estimation of Ru. The situation is different than 
before since Ru = 0 ( 1 ), which means that the plug-in estimator Ru^) has the second-order bias 
with 0{m~^). Thus we need to obtain the second-order bias of i?ij(0) and correct them. By the 
Taylor series expansion, we have 

dRu{cf))' 


Ru{cf)) = Riiicf}) -h 


dcf)' 


(0 - 0) + - 0)' ((0 - 0) + Op(ll0 - 0f) 


\ 0090' 


Then, the second-order bias of Rii{4>) is expressed as 

E[RiS)]-Rii{(l}) 

'd'^Rii{(j}) 


(dRu{ct^)\ , 1 . 


d(f)d(f)' 


E 


( 0 - 0 )( 0 - 0 )' \ +o{m 0 


dRii{cj)) 

00 ' 


, 1 
H 


d‘^Rii{cj)) \ 
0000 ' 


o{m L 


where is the sub-matrix of Cl with respect to cf), and 6 ,^ is the second-order bias of 0 given in 
Corollary [5J The straightforward calculation shows that 


0i?lj(0) ^ _ 2 -2 0^fin(0) ^ 2r -3 _ -2', 

^ ^*(1)’ Q^2Q^2 


0 i?li( 0 ) 2 

5^^100) o -3 d‘^Rli{(t>) 2-3^0 / ^ 

= =ril, {2v^il)V^il)-V^Vii2)), 
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where 


r\ •I'l rAO 

arfi 2 -4 2 ^ 2 ^^/^ - 

^*(1) = 7k = "^ 2^%' ^*(2) = = ^ 2^ (,2%- 

j=i ' i=i 


-2 4 2 

- O'- V 


jL^ Uirj V’ Q-^Q-~i' 

Therefore, we obtain the expression of the second-order bias given by 

Bi{(f)) = - - 277“377'(i)^7r + - r]~^)^ 

+ I Vi(i)i^ 77 ^i(i) - (77^(2) I , 






( 23 ) 


with Bi{(f>) = 0{m ^). Noting that Bi{(p) can be estimated by Bi{cj}) with E[Bi{(f))] = Bi{cf)) + o{m 
from Theorem [H we propose the bias corrected estimator of Ru given by 

= RuA^) - bS), 


which is second-order unbiased estimator of i?ii, namely 

E[R^Sf"]=Ru{4}) + o{m-^). 


Now, we summarize the result for the second-order unbiased estimator of MSE in the following theorem. 
Theorem 4. Under Assumption (A), the second-order unbiased estimator o/MSEj is given by 

= ThS)^^ + R 2 S) + 2i?3n($, /5), 


that is, E 


MSEi 


MSEi+ o(m-i). 


It is remarked that the proposed estimator of MSE does not require any resampling methods 
such as bootstrap. This means that the analytical estimator can be easily implemented and has less 
computational burden compared to bootstrap. Moreover, we do not assume normality of Vi and £ij in 
the derivation of the MSE estimator as in Lahiri and Rao (1995). Thus the proposed MSE estimator 
is expected to have a robustness property, which will be investigated in the simulation studies. 


4 Simulation and Empirical Studies 

4.1 Model based simulation 

We hrst compare the performances of EBLUP obtained from the proposed HNER with variance 
functions (HNERVE) with several existing models in terms of simulated mean squared errors (MSE). 
We consider the conventional nested error regression (NER) model, heteroscedastic NER model given 
by Jiang and Nguyen (2012) referred as JN, and the heteroscedastic NER with random dispersions 
(HNERRD) proposed in Kubokawa, et al. (2016). In applying the NER model, we use the unbiased 
estimator for variance components given in Prasad and Rao (1990) to calculate EBLUP. Purther, we 
also consider the following log-link gamma mixed (GM) models as the competitor from the generalized 
linear mixed models, which also allows heteroscedasticity for the variances as the quadratic function 
of means. We used glmer function in lme4 package in ‘R’ to apply the GM model. 
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In this simulation study, we set m = 20 and = 8 in all cases, and we compute the simulated 
MSE in 10 scenarios denoted by SI,... ,S10. The simulated MSE for some area-specific parameter /i, 
is define as 

= (24) 

r=l 

where R = 5000 is the number of simulation runs, /i) ^ is the predicted value from some models and 
/i) ^ is the true values in the r-th iteration. In all scenarios, we generate covariates Xij's from the 
uniform distribution on (0,1), which are fixed in simulation runs. Erom SI to S3, we consider the 
heteroscedastic model with area-level heteroscedastic variances given by 

SI ~ S3 : Vij = /3o -t- I3ixij + Vi + Sij, Vi ~ (0, r^), £ij ~ (0, af), yn = Pq + Vi, 

where af = exp(0.8 — Zi) and (/3o,/3i,t) = (1,0.5,1.2). We generate zPs from uniform distribution on 
(—1,1), which are fixed in simulation runs. The scenarios SI, S2 and S3 correspond to the cases where 
the distributions of both Vi and £ij are normal, t with 6 degrees of freedom, and chi-squared with 
5 degrees of freedom, respectively, noting that both t-distribution and chi-squared distribution are 
scaled and located to meet the specified means and variances. For S4, we consider the homoscedastic 
model given by 

S4 : yij = Pq + PiXij + Vi + £ij, Vi ~ iV(0, r^), £ij ~ iV(0, ct^), = /3o -h Uj, 

with (/3o,/3i,T,( t) = (1,0.5,1.2,1.5). In S5 and S6, we use the heteroscedastic model with unit-level 
heteroscedastic variances given by 

S5, S6 : yij = Po + PiXij + Vi + £ij, Vi ~ iV(0, r^), £ij ~ A^(0, afj), = Pq + Vi, 

where afj = exp(0.8 — Zij) in S5 and ~ r(5,5/ exp(0.8 — Zij)) in S6. For S7 and S8, we consider 
the mixed model of the form 

S7, S8 : yij = exp(/3o -h PiXij + Vi)£ij, yi = exp(/3o -h Vi), 

where Vi ~ N{0,t'^), £ij ~ r(3,3) and (/3o,/3i,r) = (0.5,1,0.3) in S7, and Vi ~ tQ{0,T^), £ij ~ 
SLN{l,a'^), and {Po, Pi,T,a) = (1.2,0.6,0.4,0.4) in S8, noting that tQ{a,b) denotes the t-distribution 
with 6 degrees of freedom with mean a and variance b and SLN{a, b) denotes the scaled log-normal 
distribution with mean a and variance b. Hence, S7 corresponds to the gamma mixed model with 
log-link function and S8 corresponds to its misspecified version. Finally, S9 to SIO are the mixed 
models defined as 

S9 : yij = iPo +I3ixij+ Vif£ij, Vi ~ 5LA^(1, cj^), yi = {Po + Vi)‘^ 

with {Pq, Pi,T,a) = (1,0.6,1.5,0.5), and 

SIO : yij = {exp(/3o PiXij) + Vi}£ij, Vi ~ iV(0,r^), £ij ~ 5LiV(l,a^), yi = exp(/3o) -h Vi, 

with {Pq, Pi,T,a) = (1,0.3,1.2,0.5). It is noted that both S9 and SIO are also heteroscedastic model 
in the sense that Var(?/jj) depends on Xij. 

Under the 10 scenarios described above, we compute the simulated MSE values of predictors from 
five methods (HNERVF, HNERRD, NER, JN and GM) in each area. Since we can apply GM only to 
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the data with positive y^/s, the MSE values of GM model are calculated from S7 to SIO. In Tabled! we 
show the mean, max and min values of MSE over all areas for each model and scenario. From SI to S3, 
it is observed that HNERVF performs better than the other models, and NER model performs worst 
since the true model is heteroscedastic. In S4, NER model performs best among four models since 
NER model is the true model and other HNER models are overfitted. It is also interesting to point 
out that the inefficiency of the prediction of JN is more serious than that of HNERVF and HNERRD. 
As in S5 and S 6 , the heteroscedastic variances are unit-level, the amount of improvement of HNERVF 
over other models gets greater. The scenario S7 corresponds to GM model, so that it is reasonable 
that MSE of GM is smallest among five models. The scenario S 8 is not GM model but it is still close 
to GM model, in which GM model works well compared to the other models. However, once GM is 
seriously misspecihed as in S9 and SIO, GM does not work very much because of its somewhat strong 
parametric assumption. From S 8 to SIO, all models are misspecihed, but HNERVF model works well 
compared to other models. Therefore, it is natural that HNERVF performs best when HNERVF is 
the true model, but even in case that HNERVF is misspecihed, HNERVF also works reasonably well 
owing to its hexible structure of the model. 

Table 1: Simulated Values of MSE for Various Scenarios and Models 



model 

SI 

S2 

S3 

S4 

S5 

S 6 

S7 

S 8 

S9 

SIO 


HNERVF 

0.368 

0.370 

0.371 

0.311 

0.280 

0.293 

0.269 

0.619 

0.198 

0.376 


HNERRD 

0.383 

0.383 

0.387 

0.310 

0.341 

0.379 

0.285 

0.641 

0.259 

0.369 

mean 

NER 

0.398 

0.405 

0.410 

0.307 

0.342 

0.384 

0.375 

0.726 

0.220 

0.384 


JN 

0.386 

0.392 

0.396 

0.324 

0.357 

0.392 

0.292 

0.684 

0.318 

0.385 


GM 

— 

— 

— 

— 

— 

— 

0.130 

0.451 

0.231 

0.396 


HNERVF 

0.598 

0.633 

0.569 

0.340 

0.354 

0.469 

0.342 

1.511 

0.299 

0.435 


HNERRD 

0.630 

0.634 

0.603 

0.342 

0.424 

0.523 

0.405 

1.603 

0.415 

0.419 

max 

NER 

0.642 

0.639 

0.596 

0.339 

0.423 

0.526 

0.518 

1.992 

0.336 

0.439 


JN 

0.634 

0.643 

0.618 

0.372 

0.445 

0.545 

0.426 

1.834 

0.532 

0.441 


GM 

— 

— 

— 

— 

— 

— 

0.149 

0.970 

0.372 

0.473 


HNERVF 

0.138 

0.145 

0.150 

0.272 

0.202 

0.196 

0.205 

0.398 

0.142 

0.297 


HNERRD 

0.156 

0.157 

0.166 

0.272 

0.254 

0.255 

0.219 

0.408 

0.142 

0.302 

min 

NER 

0.173 

0.177 

0.202 

0.269 

0.256 

0.256 

0.286 

0.442 

0.152 

0.305 


JN 

0.157 

0.160 

0.166 

0.288 

0.273 

0.256 

0.220 

0.414 

0.168 

0.314 


GM 

— 

— 

— 

— 

— 

— 

0.104 

0.335 

0.168 

0.309 


4.2 Finite sample performances of the MSE estimator 

We next investigate the hnite sample performances of the MSE estimators given in Theorem 01 To 
this end, we consider the data generating process given by 

Vij = ^0 + +Vi + Eij, Vi ~ (0, T^), Eij ~ (0, exp(7o -h 'jiZij)) 

with /3o = 1, /3i = 0.8, r = 1.2, 70 = 1 and 71 = —0.4. Moreover, we equally divided m = 20 areas into 
5 groups (G = 1,..., 5), so that each group has 4 areas and the areas in the same group has the same 
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sample size no = G + 3. Following Hall and Haiti (2006), we consider five patterns of distributions 
of Vi and Eij, that is , Ml: Vi and £ij are both normally distributed, M2: Vi and £ij are both scaled 
t-distribution with degrees of freedom 6 , M3: Vi and £ij are both scaled and located Xb distribution, 
M4: Vi are £ij are scaled and located X5 and —Xb distribution, respectively, and M5: Vi are £ij are both 
logistic distribution. The simulated values of the MSE are obtained from (j24l) based on i? = 10,000 
simulation runs. Then, based on R = 5 ,000 simulation runs, we calculate the relative bias (RB) and 
coefficient of variation (CV) of MSE estimators given by 

1 ^ M^P-MSE, 

* R ^ MSEj ’ * R ^ 1 MSEi I 

r=l r=l y J 

^ (’") 1—1 

where MSE^ is the MSE estimator in the r-th iteration. In Table [21 we report mean and median 

values of RBj and CVj in each group. Eor comparison, results for the naive MSE estimator, without 
any bias correction, are reported in Table [2| as RBN. The naive MSE estimator is the plug-in estimator 
of the asymptotic MSE (IlSp . namely it is obtained by replacing and 7 in formula (jlSp by and 7 , 
respectively. In Tabled the relative bias is small, less than 10% in many cases. When the underlying 
distributions leave from normality, the MSE estimator still provides small relative bias although it 
has higher coefficient of variation. The naive MSE estimator is more biased than the analytical MSE 
estimator in all groups and models, so that the bias correction in MSE estimator is successful. 

Table 2: The Mean Values of Percentage Relative Bias (RB) and Coefficient of Variation (CV) of MSE 
Estimator and Relative Bias of Naive MSE Estimator (RBN) in Each Group. 


Group 

Measure 

Ml 

M2 

M3 

M4 

M5 


RB 

-8.72 

-12.50 

- 10.86 

-11.51 

-11.81 

Gi 

CV 

17.48 

23.60 

23.47 

23.40 

21.24 


RBN 

-12.67 

-13.74 

-13.10 

-13.57 

-13.39 


RB 

-7.61 

-9.72 

-10.58 

-10.57 

-7.27 

G2 

CV 

17.52 

23.24 

22.70 

23.03 

20.31 


RBN 

-10.16 

- 12.66 

-11.48 

-11.33 

-10.54 


RB 

-7.89 

-8.39 

-7.65 

-8.92 

-6.34 

G3 

CV 

19.85 

26.05 

24.66 

25.37 

22.94 


RBN 

-9.31 

-9.43 

-8.70 

-9.86 

-7.58 


RB 

-6.52 

-4.74 

-4.96 

-5.65 

-4.27 

Gi 

CV 

22.02 

28.37 

26.93 

27.68 

24.98 


RBN 

-10.83 

-7.68 

-7.98 

-6.52 

-6.42 


4.3 Real data application 

We now apply the HNERVE model together with HNERRD, NER, JN and CM models considered in 
the simulation study in Section[4T]to the data which originates from the posted land price (PEP) data 
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along the Keikyu train line in 2001. This train line connects the suburbs in the Kanagawa prefecture 
to the Tokyo metropolitan area. Those who live in the suburbs in the Kanagawa prefecture take this 
line to work or study in Tokyo everyday, so that it is expected that the land price depends on the 
distance from Tokyo. The PLP data are available for 52 stations on the Keikyu train line, and we 
consider each station as a small area, namely, m = 52. For the i-th station, data of n, land spots are 
available, where nt varies around 4 and some areas have only one observation. 

For j = 1,..., nj, yij denotes the scaled value of the PLP (Yen/10,000) for the unit meter squares 
of the j-th spot, Tj is the time to take from the nearby station i to the Tokyo station around 8:30 in the 
morning, Dij is the value of geographical distance from the spot j to the station i and FARij denotes 
the floor-area ratio, or ratio of building volume to lot area of the spot j. The three covariates FARij, 
Ti and Dij are also scaled by 100,10 and 1000, respectively. This data set is treated in Kubokawa, et 
al. (2016), where they pointed out that the heteroscedasticity seem to be appropriate from boxplots 
of some areas and Bartlett test for testing homoscedastic variance. They used the PLP data with 
log-transformed observations, namely log yij, but we use yij in this study since the results are easier 
to interpret than the results from log yij. In the left panel of Figured! we show the plot of the pairs 
{Dij,eij), where Cjj is OLS residuals dehned as 

Cjj = yij - {f3o,OLS + FARijPi^oLS + Ti/32,oLS + Dijfi^^oLs)- 

The hgure indicates that the residuals are more variable for small Dij than for large Dij, and the 
variances are exponentially decreasing with respect to Dij. Thus we apply the HNERVF model with 
the exponential variance function given by 

Vij = /do + FARij (3i + Tif32 + DijjS^ + Vi + Sij, (25) 

where Vi ~ (0,r^) and Sij ~ ( 0 ,exp( 7 o -|- yiDjj)). To compare the results, we also apply HNERRD, 
NER, JN and GM models to the PLP data with the same covariates. In applying NER model, we 
regard it as the submodel of HNERVE by putting 71 = 0 and use the same estimating method with 
HNERVE. The estimated regression coefficients from five models are given in the Table [3l We first 
note that the conditional expectation of the GM model is exp(/3o -|- FARijj3i + Tif32 + Dijfi^, + Vi), 
while that of other models has the liner form /3o + FARijf3i + Til32 + Dij(52, + Vi. Hence the scale of the 
estimated coefficients of GM are different from those of other models. However, the signs of estimated 
coefficients are the same over all models. The resulting signs are intuitively natural since the PLP is 
expected to be decreasing as the distance between the spot and the nearest station gets large or the 
nearest station gets distant from Tokyo station. Moreover, in HNERVE model, the estimated value 
of 7 i is % = —1.82, which is consistent to the observation from the left panel of Eigured) Using the 
result of Theorem [H the asymptotic standard error of 71 is 0.492, so that 71 seems signihcant. 

We here consider to estimate the and price of a spot with floor-area ratio 100% and distance from 
1000m from from the station i, namely /ij = /3o -|- /3i + (52Ti + (S^ + Vi of HNERVE, HNERRD, NER 
and JN models, and /r* = exp(/3o -t- ,5i + f52Ti F (S^ + Vi) of GM model. In Eigure [21 we provide the 
predicted values of of each model. Erom the hgure, we can observe that all hve models provides 
relatively similar predicted values, and the predicted values tend to decrease with respect to the area 
index. This comes from the effect of Tj since Ti increase as the area index increases. 

We hnally calculate the mean squared errors (MSE) of predictors. In JN model, the consistent 
estimator of MSE cannot be obtained without any knowledge of grouping of areas (stations) as shown 
in Jiang and Nguyen (2012). Eor GM models, the second-order unbiased estimator of MSE is hard to 
obtain. Thus, we here consider the MSE estimator of HNERVE, HNERRD and NER models. We use 
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the analytical estimator given in Theorem 0] for HNERVF and NER, and the parametric bootstrap 
MSE estimator developed in Kubokawa, et al. (2016) is used for HNERRD with 1000 bootstrap 
replication. We found that the estimated MSE of HNERRD model is greater than 700 for all areas, 
while the estimated MSE of HNERVE and NER models are smaller than 20. The estimated value of 
shape parameter in dispersion (gamma) distribution in HNERRD is close to 2, which may inflate the 
MSE values. The estimated values of root of MSE (RMSE) of HNERVF and NER models are given 
in the right panel of Figure [TJ It is revealed that the estimated RMSE of HNERVF is smaller than 
that of NER in many areas. In particular, this is true in 37 areas among 52 areas. Especially, in the 
latter areas, it is observed that the amount of improvement is relatively large. 




Figure 1: Plot of OLS Residuals Against Distance Dij (Left) and Estimated root of MSE (RMSE) in 
HNERVF and NER models (Right). 


Table 3: The Estimated Regression Coefficients in Each Model 


model 

/3o 

/3i 


/33 

HNERVF 

42.31 

2.81 

-3.56 

-0.661 

HNERRD 

37.72 

3.88 

-3.24 

-0.960 

NER 

33.35 

6.58 

-3.18 

-0.832 

JN 

37.01 

3.41 

-2.59 

-3.19 

CM 

3.63 

0.168 

-0.122 

-0.039 
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Figure 2: Predicted Values of jii in Each Model. 


5 Concluding Remarks 

In the context of small-area estimation, homogeneous nested error regression models have been ex¬ 
tensively studied so far in the literature. However, some real data sets show heteroscedasticity in 
variances as pointed out in Jiang and Nguyen (2012). To extend the traditional homogeneous nested 
error regression models, Jiang and Nguyen (2012) and Kubokawa, et al. (2016) have proposed het- 
eroscedastic nested error regression models, respectively. The drawback of the two models is the 
normality assumption is required for the response values. To overcome the problem, we have proposed 
the structure of unit-level heteroscedastic variances modeled by some covariates and unknown parame¬ 
ters and suggested heteroscedastic nested error regression models without assuming specihc underlying 
distributions. In terms of the variance modeling with covariates, the generalized linear mixed models 
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are also popular tools, but it requires somewhat strong parametric assumptions. Therefore, HNERVF 
model has clear benefits in real application. Conversely, one drawback of HNERVF is probably the 
structure of heteroscedastic variances specified by some covariates and unknown parameters, while two 
heteroscedastic models by Jiang and Nguyen (2012) and Kubokawa, et al. (2016) do not requires such 
a specific structure. However, the heteroscedastic variances can be often modeled by some covariates 
as in the real data application given in Section 14.31 
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Appendix 


A.l. Proof of Theorem 1. Since Pi,... ,y^ are mutually independent, the consistency of 7 
follows from the standard argument, so that and (3 are also consistent. In what follows, we derive 
the asymptotic expressions of the estimators. 

First we consider the asymptotic approximation of — r^. From Q, we obtain 


m rii 


^2 

T — T = 


iUij — ®ij/3oLs)^ “ 


— T 


i=l j=l 
m Ui 


^ E Z ^ E E 4(1) 

2 = 1 j = l 2=1 j = l 


m rii 


OLS “ /3) + Opi'-f - 7 ) + Op(/3oLS “ /3) 

i=l j=l 

^ m ^ m TLi 

“ E ''I* " E E - T') + Op("i"^/^) + 0p(7 - 7 ), 


m 


2=1 


2 = 1 j = l 


(26) 


where uu = mN ^ Yl'jLi |(2/7 “ “ 4'} used the fact that /Jqls ~ P — Op{m ^/^) 

and N~^ Yl^LiiVij — ^ijP)^ij = Op{m~^^‘^) from the central limit theorem. 

For the asymptotic expansion of 7 , remember that the estimator 7 is given as the solution of the 
estimating equation 


1 

N 


m Ui 

EE 

1=1 j=i 


[vij - Vi- {Xij - a3i)'3oLs} - 4(^b - 2n. ^Zij -h n- ^Zi) 


= 0 
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Using Taylor expansions, we have 


^ 2 rn rii 

{Vij Vi i^ij ^i) ^iji^ij ^i) {f^OLS f^) 

i=l 2=1 j=l 

^ m Tii 

“ 2n-^Zij + n-Hi)z'ij{j - 7) + 0^(7 - 7) + 


i=i j=i 


where 


Tli 

U2i = mN~^ y] \^{yij - Vi - {xij - Xi)'(3^ Zij - cjfj{zij - 2n~^Zij + n~'^Zi) 


j=i 

From the central limit theorem, it follows that 


1 

N 


m rii 


y~l [vij -yi- {xij - XiYP} Zij{xij - Xi)' = Op{m 


i=l j=l 

SO that the second terms in the expansion formula is Then we get 

-1 


1-1 = 


N 


m rii 


m 


Yl Y ~ ^Zi)^ij I X] + Op{j - 7) + Op{m 

i=i j=i 


2=1 


Under Assumption (A), we have 

m rii 


YY^lD^^l ~ = 0{m). 

i=l j=l 


From the independence of ... ,y^ and the fact E{u2i) = 0 , we can use the central limit theorem 
to show that the leading term in the expansion of 7 — 7 is Op{m~^/‘^). Thus, 


1-1 = 


N 


m Hi 


m 


Y Y 4(1 )+ n- ^Zi)z'ij I y] U2i + Opim 

i=i j=i 


2=1 


Using the approximation of 7 — 7 and 1 — 1 = Op{m we get the asymptotic expression of 
from (I 26 |) . which establishes the result for and 7. 

Finally we consider the asymptotic expansion of /3 — / 3 . From the expression in ([ 3 ]), it follows that 


P - P = P - (3 + Y^ ( 7 ^ - 7 ) + - 7 ) + Opi?"^ - T^). 


V^T 


Since 


— E - J 

g^2^‘ - 


Y 

d-y, 


® f ) • • • 5 9) 
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(27) 


for Wif^s) = diag{a^^^jZiis ,.. ■, , we have 


where 


d 

dr'^ 


(3={X'T,-^X) ' - 13) , 


V. 2=1 


= {X'S-‘x)-‘ (f^X'E-‘W,(.,SriXi) (/3;, -/3) 


^ 2=1 


s = l...,q, 


/3‘ = E V.S-> £ X'E-> J„.S->y„ 

\ 2=1 / 2 = 1 

/ m \ m 

= E . = 1 ,..., g. 


^ 2=1 


2=1 


Under Assumption (A), we have (3^ — (3 = Op{m~^'‘^) for a € {r, 71 ,..., 7 ^}, whereby (3 — (3 = 
Op{m~^/‘^). Since 7 — 7 = Op{m~^/'^) and t'^ — = Op{m~^^'^) as shown above, we get 

m 

3 - /3 = (X'S”iX)"' XiJ:-\y, - Xi(3) + Op(m-V2), 

2=1 

which completes the proof. 


A 2 . Proof of Corollary [T]. Let 0 = ( 6 * 1 ,..., 0p+q+i)'= (/3', 7 ', r^)'. Note that ^ = Ij ••• jP + 
q + I does not depend on ..., ..., and that y^,..., y^ are mutually independent. 

Then, 


7 E 




E<‘ El‘ 


7=1 


7=1 


Vi 




E 'ijjf'iljf 




+ 


m 




2 




where is the (fe,/)-element of fl and we used the fact that El'tpj'^ly^] = El'ijj^j'^] = 0 for j / i. 
Hence, we get the result from the asymptotic approximation of 9 given in Theorem [TJ 


A3. Proof of Theorem [2l We begin by deriving the conditional asymptotic bias of 7 . Let 7 be 
the solution of the equation 


^ m Ui 

^( 7 ; /3) = E E [{y^j ~y^~ ^*7 - 


i=l j=l 


= 0 


with afj = a‘^[z[-^). For notational simplicity, we use F instead of i^( 7 ; (3) without any confusion and 
Fr, r = !,...,(/ denotes the r-th component of F, namely F = (Ti,..., Fq)'. Define the derivatives 


F{a) and F^^ah) by 


F 


_ dF 


Fr(ab) 


d‘^Fr 

dadb' 
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It is noted that Fi^{i 3 -y) = 0. Expanding F{^-, /3ols) = 0, we obtain 

0 = F + F (-y)(7 — 7) + F(/ 3 )(/ 3 ols “ / 3 ) + 2^1 + 2^2 + Op(m ^), 
where F = (tsi, • • •, tgq), s = 1,2 for 

hr = (7 - 7)'^r(77)(7 “ 7 ), i2r = (3oLS “ / 3 )'^r(/ 3 / 3 ) (3oLS “ / 3 )- 

It is also noted that 

^ m rtfc 

^kj(i)i^kj ‘2‘'nf. Zkj + rif^ ^k)zk. 


-^( 7 ) 




fc=ij=i 
m rij^ 


-^(/3) A7 ^ ^ ^ ^ Vk {^kj ®fc) ®fc) ) 


N 

k=l j=l 

SO that Fis non-stochastic. Thus we have 

^[7-7|yi] = !^E[F{-f, (3)\yi] + E F(^)(3ols “/3) Vi + + ^^[*2|t/j| + Op(m"^). 

In what follows, we shall evaluate the each term in the parenthesis in the above expression. For the 
first term, since y^,..., are mutually independent and E{u 2 i) = 0, we have 

E[F{'y,(3)\yi] = ^U2i. 

For evaluation of the second term, we define Z^r = diag( 2 :fcir.,..., Zkn^,r), where Zkjr denotes the r-th 
element of Then it follows that 


E r(/3)(3oLS “ P) 


Vi 


N 


E® 


fc=i 


iVk ~ ^kPyEkZj^rFkXkiPoi^s — P) 


Vi 


2 p "I 2 r 

E {yk — XkPyEkZkrEkXkiPohS ~ P) Vi — j^iVi — XiP)'EiZirEiXiE Pols — P 


N 

k=l^k^i 

Noting that it holds for i = 1,... ,m and k ^ i 


Vi 


E 

we have 


(y, - X,/3)(y, - Xfc/3)' y, = l{,=A,}Sfc, F[3ols - P\y^] = (^'^) ' X[{y, - X,/3), 


E 


Vi 


(Vk ~ ^kPyEkZkrEkXkiPoLS ~ P) 

m 

= ^tr {EkZkrEkXkiX'X)-^X'f,E [(y^ - X,/3)(y;, - Xfc/3)'|y,] } 
£=1 

= tr {(X'X)-iX'fcEfcFfcZfc,FfcX4 , 
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which IS 0{m and 


N 


{Vi- Xil3)'EkZkrEkXkE = Op{m ^). 


Thus, we get 


^r{l3){PohS ~ P) 


Vi 


m rik 


m 


^X'^EikEkZkrEkXk] -\- Op{m 




( 28 ) 


k=l j=l 


where the leading term is 0{m ^). For the third and forth terms, note that 

m 71]^ 


E 


r(~fy) 


11V I t'/g 2 '' 

y y ^k-i( 2 )i^kj ~ ^kj E Tly. Zk)ZkjZkjr -^r(/3/3) ~ Tr ^ Xj^Ej^ZkyEf^X 


N 


k=l j=l 

which are non-stochastic. Then for h = 1,... ,q, 

^ m Tik 


k=l 


^ ^ ^kjr^kj{2)i^f^j ^k El^^Zkj T OpijJl ), 


k=lj=l 


E[t 2 r\yi] = {X'kEkZkrEkXkVoLs) +Op{m 


N 

k=l 

for VoLS = {X'X)~^X''EX{X'X)~^, where we used Corollary [T] and 

(P OLS ~ P)(PoLS ~ PYlVi =XoLS+ Op{m ^), 
which follows from the similar argument to the proof of Corollary [TJ Thus we obtain 

m rik 

N 


(29) 


^ m rik 

^^j^kj{2)^^^j ^k Op(^lTl ), 


k=lj=l 


E[t 2 \yi] = ^ {tr {X'kEkZkrEkXkVoLs)}^ + Op{m ^), 


k=l 


where {ar}r denotes the g-dimensional vector (oi,... ,aq). Therefore, we have established the result 
for 7 in (fTSl) . 


We next derive the result for r^. Let 

1 


nk 


T^ = 


XkPYiyk - XkP) - E <■ 


k=l 


f=l 


Using the Taylor series expansion, we have 

-2 


2 ~2 


T =T + - l) + -{l - l) 


(wJ 


+ - P) + -j^iPohS ~ PY dpdp'J - P) + Op{m ^) 
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where we used the fact that /d^df3' = 0. The straight calculation shows that 

_ 1 ^ ^2 _ 02-2 _ 1 ^ _^2 _ _ 2 ^ ^ 

d-y ~ iV dydy' ~ Af Qf^Qj^' ~ * *’ 

' k=l j=l ' ' k=l j=l ^ k=l 

which are non-stochastic. Thus we obtain 

E[t‘^ - r^lyj = E[?2 _ r^lyj + Ely- yly^] + ^tr | E [(7 - 7)(7 - 7)']^^] 


+ E 




(Pols ~ P) 


Vi 


1 r/ 92:^2 \ 


+ 2 *^ 


E 


V 9 / 39/37 

= + Br2{yi) + Br'iiyi) + BriiVi) + Br^iyi) + Op(m"^) 

From the expression of , it holds that 


(/^OLS “ P){Po'LS ~ P)'\yi f + Op{m ) 


Brliyi) ^ ^ 


UkT 


k=l,k^i 


+ ^ I (y, - XiOYiy, - X,P) - 4 1 


— r 


\ 2 I ^ 2 2 


N 


/I «'\Z( i z 

= 7 “ A7n -T = —uii, 


m N 


m 


for uii dehned in ([8]). Also, we immediately have 


m rik 




k=lj=l 


For evaluation of BT-^{yY)., note that 


9^2 ^ 

~^~~N 


J2x'k{yk-XkP). 


k=l 


Similarly to (f28ll . we get 


BrAiyp — ^ ^ E 


k=l 


iy,-X,pyxSoLS-P) 


yi 


N 


tv{{X'X)-^X'f,'SkXk} + Op(m-i). 


k=l 


Moreover, Corollary [T] and (l2^ enable us to obtain the expression of Brsiyi) Br^iyp, whereby 
we get 




^ m rik 

= m-\ii - — '^^^lj(i)z'kj {b^iyP - b^} 


+ br, 


k=l j=l 


which completes the proof for in (|13p . 
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We finally derive the result for (3. By the Taylor series expansion, 


3-/3 = /3-/3 + ^ (7^ - 7) + ^)’ 


since 


a /3 


/ / dp 


ITT (0-0)(0-0) ITT =Op(m ), 


a(/) 




from 8(3/d(f} = Op{m as shown in the proof of Theorem [TJ From (I27p . we have 


a 




q / m 


X'S-iX) ' ^ ^ { (/3; - /3) a - 7.) - (/3 - P){ls - 7«)} 


S=1 \fc=l 


and 


a. 

lar^ 


/3) a-r2) = (X'E-IX) ' {(/3: - (3){t‘^ - r^) - (/3 - (3){t‘^ - 


^fc=i 


Let = -E[(/3^^ — /3)a “ 7s)] ^/3*r = ^[{Pt ~ P){P ~ 'r)]- Then it can be shown that 

E[(P*r -P){r- r)\yi] = + Op{m~^), £’[( 3 *^ - P){% - 7 *)!^*] = ^ 0 *t + Op{m~^), 

which can be proved by the same arguments as in Corollary [1] Thus from Corollary [1] and the fact 
that 


E 


P-PlVi =(X'5]-^X) ^X'^~Hyi-Xi(3), 


we obtain the result for (3 in (fTB|l . 

A4. Proof of (lisp . From the expansion of //*, we have 


E [{(2i - mf] = E 


a/r^V 

86 


(6-6) 


+ lu^ + ^U 2 , 


where 


Ui=E 

U2=E 


lV(g-«)(g-») 77 !L 


aa, 

(6 - ey 


/ 8"^^ 

\8e8e' 


e=e* 


\8986' 

(6-6) 


e=e 

2 


(6-6) 


It is noted that 


p+q+lp+g+lp+g+1 

f'l = E E E ® 

j=l k=l £=1 


a/iA / 8‘^iJ.i 
86jj \89k86e 


e=e* 


* ) i^j ~ - 6k){6i - 


p+g+lp+q+lp+q+l 

= £ £ £ 

j=i k=i l=i 
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and 


\Uijki\ < E 


< E 


dfJ.i 

de, 

djii 

89. 


f 

\d9kd9i 

/ d'^Jli 

[d9kd0i 


e=e* 


0=0” 


(Oj - 9j)(6k - 6k)(9i - I 


1/4 


E 


(9j - 9j){9k - Ok)(Be - 0^) 


4 / 3 ' 


3/4 


(30) 


using Holder’s inequality. Since both d^i/dOj and 8“^^i/dOkdOi are linear functions of i/j, the first 
term of (130^ is finite under Assumption (A). Moreover, from Theorem [H it follows y/m\6j —9j\ < C{y) 
for some quadratic function of y, so that the second term in ()30p is also finite. Hence, we have 
Ui = o{m~^). Similarly, we also obtain U 2 = o{m~^). Therefore, using Corollary[Tl we have 



since c{yj) is fourth-order function of y^ and 8\iil8Q is a linear function of y^, which completes the 
proof. 


A5. Derivation of R^ii{cj}, k). Since y^ given Vi,ei is non-stochastic, we have 


86 


= E 


E 


{6 - e)wi 


(e-e)wi 


86 


= E 


= Rzii{4>) + o{m~^). 
It is noted that E{wi) = 0 and 


Vi,ei 

+ E 


= E 


E(e-e\y,Y 


'>?(«!)' I m 


+ E 


86 




+ o{m ) 


rii 


rii 


E [{yij - x[j(3)w^] = E [{vi -)- eij)wi] = - 1 ^ Xijaij = 0. 


(31) 


v/=l 


/=! 
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Using the expression (fT^ and (fT7)l . it follows that 


E 


V f ^ 


ba^Vi 


dp 


Wi 


IJ Xij 




E 




dj 

djlj 

dr 


^X'^-^E[{yi-XiP)w,]=0 

nk \ 

(EE ^kh{l)^khZ'kh] M2ij{p,K) 


~ I 'y ^ ^ 

i=i 

rii 


m rik 


J=1 


\k=l h=l 


Wi 


rii 


= m 


i=i 


-Ti(7)'T2(7)M2*,(0,k) , 


where 


M2ij{p, k) = E [u2i{yij - x'^jP)w^] , Muj{p, k) = E [uuiyij - x[jP)wi] . 

To evaluate Muj and M2ij, we first prove the following result for fixed j, k,£ £ {1,. .., rii}. 

E[{vi + eij){vi + eik){vi + £ii)wi] = t ^(3 - k „) + Keafjl{j=k=iy + - l{j=k}) 

+ crij{'^{j=ejtk} — l{i=u) + f^iki^{k=eiij} - l{fc=£}) 


(32) 


To show (|32p . we note that the left side can be rewritten as 


rii 

~ "Hi [(uj + £ij){vi + £ik){vi + £ie)vi\ + XihE [{vi + £ij){vi + £ik){vi + £ie)£ih] (33) 

h=l 

from the definition of Wi. Using the fact that en,... ,£ini and Vi are independent, the first term in 
HMD is calculated as 

E [vf + {^ij^ik + £ij£ii + ^ik^ipvf] = KyT^ + {crfjl^j=k} + (^ij^{j=l} + (^ik^{k=l}) ■ 

Moreover, we have 


E [(uj + £ij){vi + £ik){vi + EipEih] — E [sihi^ij + £i£ + £ik)Vi + £ij£ik^ii£ih\ 

= i)-{h=j} + + l{/i=u) + klef^ih^{j=k=t=h} 

2 / 2 2 2 \ 

T ^ih \Eij^{j=k^l=h} T 'Xijl^j—i^k=h} T ^ik^{j=h^k=£}) j 

whereby the second term in (j33p can be calculated as 


rV 


[3r^ + KsCr^ 


i] 


l{j=fc=£} + + (yijl{j=ejkk} 


T cr?j,l 


ik^{k 


=f.^3}\ 


where we used the expression Xih = Then we established the result (l3^ . From (f32]) . we 

immediately have 


rii 

'^E[{vi + £ij){Vi + £ik){Vi + EipWi] 

e=i 


= ^ [niT^{3 - Ky) + crjj{Ke 


= E[{vi + £ij){vi + eikfwi]. 


3)l{i=fc}] 
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Now, we return to the evaluation of Muj and M 2 ij- It follows that 


Tli 

Muj{cf), = [iVih - x'ihfBfivij - x'ijl3)wi] 

^ h=l 

= miV“^r7^"V^|nir^(3 - Ky) + a^j^Ke - 3)| 

and 


m 

M2ij{(l),K) = [{Vi+£ih- {Vi + Si)}^ {Vi + £ij)Wi] 

^ h=l 

TTz r 

— ^ ^ ^ih \ E ^ih) ^ ^ ^ [(^2 

^ h=l ^ k=l 

Tl-i TL'i >. 

E [(Uj + £ij)ivi + Sik){Vi + £i£)Wi] y 
k=i e=i ^ 

Using the identity given in (|32l) . we have 

rii 

k) = mN-^T^T]-^ J2 Zih[(Tl{Ke “ 3)(l{j=h} - 2n“H{j=;,} + n“2)| 
h=l 

= mN-\‘^r]-^n-‘^{ni - lf{Ke - 3)a^jZij, 
which completes the result in (I2UI) . 


A6. Evaluation of i?32i(0)' Since given Vi and ej is non-stochastic, we have 


-R32i(^) — -^E 


{d - ey 


f d'^m 

I deoe' 


e=e* 


{0 - e)wi 


= -E 
2 


E 


{d-ey 


f d^m 

I ddde' 


9=0* 


{0 — 6)wi 



■ 

Vi, Ei 



_ 


= —tr { yiE 


/ d'^yi 

\dede' 


9 = 9 * 


Wi 


+ o{m ^)E 


tr \ ciVi) 


/ d‘^yi 

V d9de' 


9=9* 


Wi 


where we used Corollary [T] in the last equation. Note that 


d'^yi 


dGde' 


_ d'^yi 
9=9* dOdO' 


p+q+l 


7 + E 


k=l 


d^yi 


dedG'dOk 


ek=et* 


(34) 


where 0'^ is an intermediate value between 0^ and 9k- Further note that the third order partial 
derivatives of yi is a linear function of yi, so that the second term of R^ 2 i is o{m~^). Similarly, it 
follows that 


E 


/ d'^yi 

I dGdG' 


e=9* 


Wi 


= E 


dGdG') 


+ 0(1) 


o(l), 


since the second order partial derivatives of yi is a linear function of yij — x'-(3 and the identity (f3T]l . 
Therefore, we hnally get i?32i(0) = o{m~^). 
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